library(maptpx)
## Loading required package: slam
library(qtlcharts)
library(data.table)

GTEx Thinned Data

We thinned the GTEx data so that the number of reads per tissue sample and per gene is small, on the order of single-cell data. In the Jaitin et al. dataset there were on average 2540 reads per cell; across the full experiment, with 4590 cells and 20190 genes, the total number of reads recorded was 11658921. For the GTEx dataset the corresponding numbers are far larger: the 16407 cis genes have 3.5e+11 reads, and each sample has on average 41073880 reads, which is around \(10^4\) times the per-cell read count in the Jaitin dataset. So we use two thinning thresholds, \(p=0.0001\) and \(p=0.00001\).
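One common way to thin a count matrix is binomial thinning, where each read is kept independently with probability \(p\). The sketch below illustrates that idea on simulated counts; it is not necessarily the exact procedure used for the plots in this report. The toy matrix `counts` and the helper `thin_counts` are hypothetical names introduced only for illustration.

```r
# Minimal sketch of binomial thinning on a toy genes x samples count matrix.
set.seed(1)

# Hypothetical toy data standing in for the GTEx read counts.
counts <- matrix(rpois(20 * 5, lambda = 5e4), nrow = 20, ncol = 5,
                 dimnames = list(paste0("gene", 1:20), paste0("sample", 1:5)))

# Keep each read independently with probability p (assumed thinning scheme).
thin_counts <- function(counts, p) {
  thinned <- rbinom(length(counts), size = as.vector(counts), prob = p)
  matrix(thinned, nrow = nrow(counts), dimnames = dimnames(counts))
}

thinned <- thin_counts(counts, p = 0.001)

# Per-sample read depth before and after thinning.
depth_before <- colSums(counts)
depth_after  <- colSums(thinned)

# Depth ratio quoted in the text: GTEx reads per sample over
# Jaitin reads per cell, which is on the order of 10^4.
depth_ratio <- 41073880 / 2540
```

Comparing `depth_before` with `depth_after` shows how strongly a given \(p\) reduces the per-sample depth, which is a quick check that the thinned data sit near single-cell depth.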

GTEx-thinned Structure (thinning threshold 0.001)

GTEx-thinned Structure (thinning threshold 0.0001)

What does the brain look like? (thinned at 0.001)

Correlation heatmap of brain after thinning

## Set screen size to height=700 x width=1000

Correlation heatmap of brain without admixture

## 
Read 60.9% of 16407 rows
Read 16407 rows and 8556 (of 8556) columns from 0.262 GB file in 00:00:05

What about the full thinned data? Admixture first

The thinning parameter is again \(0.001\).

Full thinned data hierarchical clustering

## 
Read 60.9% of 16407 rows
Read 16407 rows and 8556 (of 8556) columns from 0.262 GB file in 00:00:04
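As a rough illustration of hierarchical clustering on thinned counts, the sketch below clusters simulated samples with base R. The toy data, the log transform, the Euclidean distance, and the average linkage are all assumptions made for this example and are not taken from the original analysis.

```r
# Minimal sketch: hierarchical clustering of samples from a toy count matrix.
set.seed(1)

# Hypothetical toy data: 12 samples x 40 genes, two groups with different depth.
toy_counts <- rbind(matrix(rpois(6 * 40, lambda = 5),  nrow = 6),
                    matrix(rpois(6 * 40, lambda = 20), nrow = 6))
rownames(toy_counts) <- paste0("sample", 1:12)

# Log-transform to damp the influence of highly expressed genes (assumed choice).
log_counts <- log2(toy_counts + 1)

# Euclidean distance between samples, then average-linkage clustering.
hc <- hclust(dist(log_counts), method = "average")

# Cut the dendrogram into two groups and draw it.
groups <- cutree(hc, k = 2)
plot(hc, main = "Toy hierarchical clustering")
```

The vector `groups` gives a cluster label per sample, which can be tabulated against known tissue labels to see how well the dendrogram recovers them.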

Finally, are there actually 2 clusters in liver? (Remember t-SNE!)

## 
Read 60.9% of 16407 rows
Read 16407 rows and 8556 (of 8556) columns from 0.525 GB file in 00:00:05
## 
## Estimating on a 119 document collection.
## Fitting the 2 topic model.
## log posterior increase: 28.461, 22.884, 32.301, 56.253, 103.45, 158.217, 183.798, 197.604, 264.475, 318.423, 199.361, 161.729, 477.518, 201.435, 88.212, 37.586, 12.78, 13.029, done.

It does not look like any distinct clusters are present.
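The log above comes from fitting a 2-topic model with `maptpx`. A minimal sketch of such a fit, followed by a Structure-style stacked bar plot of the sample-level topic proportions, might look like the following. The simulated matrix `toy_counts` (samples in rows, genes in columns) is a hypothetical stand-in for the liver data, and the `tol` value is an assumed setting.

```r
# Minimal sketch: fit a 2-topic admixture model and plot topic proportions.
library(maptpx)
set.seed(1)

# Hypothetical toy data: 30 samples (rows) x 50 genes (columns).
toy_counts <- matrix(rpois(30 * 50, lambda = 10), nrow = 30, ncol = 50,
                     dimnames = list(paste0("sample", 1:30),
                                     paste0("gene", 1:50)))

# Fit the topic model with K = 2 clusters (tol is an assumed setting).
fit <- topics(toy_counts, K = 2, tol = 0.1)

# omega holds the per-sample topic proportions (samples x K).
omega <- fit$omega

# Structure-style plot: one stacked bar per sample.
barplot(t(omega), col = c("steelblue", "tomato"), border = NA,
        xlab = "Samples", ylab = "Topic proportion")
```

If the samples really formed two clusters, the bars would split into two blocks dominated by different colours; roughly uniform mixing across samples is consistent with no distinct clusters.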

Single cell data analysis

Eberwine Data

Paper

mouse single cells

Satija Data

Paper

Zebrafish single cell data

Mouse cortex + hippocampus

Paper

## 
Read 50.0% of 19982 rows
Read 19982 rows and 3007 (of 3007) columns from 0.113 GB file in 00:00:10

The Structure plot for 10 clusters for this dataset:

Finer cell subtypes in the Amit data